STATISTICAL ESTIMATION IN THE PROPORTIONAL HAZARDS
MODEL WITH RISK SET SAMPLING

By Kani Chen 

Hong Kong University of Science and Technology 

Thomas' partial likelihood estimator of regression parameters is 
widely used in the analysis of nested case-control data with Cox's 
model. This paper proposes a new estimator of the regression param- 
eters, which is consistent and asymptotically normal. Its asymptotic 
variance is smaller than that of Thomas' estimator away from the 
null. Unlike some other existing estimators, the proposed estimator 
does not rely on any more data than strictly necessary for Thomas' 
estimator and is easily computable from a closed form estimating 
equation with a unique solution. The variance estimation is obtained 
as minus the inverse of the derivative of the estimating function and 
therefore the inference is easily available. A numerical example is 
provided in support of the theory. 

1. Introduction. Thomas' partial likelihood estimate [Thomas (1977) 
and Oakes (1981)] is the most popular estimate of regression parameters in 
nested case-control (n-c-c) studies using Cox's proportional hazards model. 
The partial likelihood score has a simple closed form expression and there- 
fore the estimate is computationally simple with easily available inference. 
More important, Thomas' estimate only relies on the time-restricted n-c-c 
data: the failure times of all cases and the covariates of controls (cases) at 
the time when they are sampled (fail). The aim of this paper is to propose 
a new estimate that uses only the time-restricted n-c-c data and is more 
accurate than Thomas' estimate away from the null. This estimate is also 
easy to compute with readily available inference. Throughout the paper, an 
estimate is said to be more accurate or efficient than another estimate if the 
former has smaller asymptotic variance. 
Statistical analysis of n-c-c designs has attracted considerable attention
in the past decade; see Langholz and Thomas (1990, 1991), Goldstein and 
Langholz (1992), Robins, Rotnitzky and Zhao (1994), Borgan, Goldstein and Langholz 
(1995), Langholz and Goldstein (1996), Breslow (1996), Samuelsen (1997), 
Suissa, Edwardes and Boivin (1998), Borgan and Olsen (1999) and Chen
(2001), among many others. Some competing estimates were also studied 
in the literature; see Robins, Rotnitzky and Zhao (1994), Samuelsen (1997) 
and Chen (2001). However, all these studies depend on the extended n-c-c 
data which are more than strictly necessary for Thomas' estimate. The
extended n-c-c data are defined as the observed failure or censoring times and
failure or censoring indicators for all cohort members and the entire
covariate histories for all cases and controls. Robins, Rotnitzky and Zhao (1994)
pointed out that Thomas' estimator is not semiparametrically efficient based 
on the extended n-c-c data but they only dealt with time-fixed covari- 
ates. Samuelsen (1997) proposed an estimator via the inclusion probability 
method but it is not always more accurate than Thomas' estimator; see dis- 
cussion in Chen (2001). The estimation method of Chen (2001) leads to a 
semiparametrically efficient estimator but it inevitably involves estimating 
and inverting a Fredholm operator, which is computationally difficult. 

The most serious practical limitation of the estimators of Samuelsen 
(1997), Chen (2001) and Robins, Rotnitzky and Zhao (1994), in contrast 
with Thomas' estimator, is that they rely on the extended n-c-c data rather 
than only the time-restricted n-c-c data. The extended n-c-c data contain 
components that are often not available or, even if available, are much less 
reliable than the time-restricted n-c-c data. First, the nonfailures that are 
not sampled as controls are usually not closely followed up, and therefore 
their censoring times are often not accurately observed. This happens par- 
ticularly when the cohort is loosely defined [Chen and Lo (1999)]. Second, 
the ascertainment of the entire covariate histories for cases and controls is 
often too difficult a task to accomplish with reasonable accuracy. Thus, a 
new estimator would be greatly desirable if it uses only the time-restricted 
n-c-c data and is reasonably accurate. 

The next section introduces notation and Thomas' estimator based on the 
time-restricted n-c-c data. Section 3 presents the proposed estimator and its 
consistency and asymptotic normality. A numerical example is provided in 
Section 4. A few closing remarks are given in Section 5. Proofs are presented 
in the Appendix. 

2. Thomas' partial likelihood estimator. Let $\{T, C, Z(\cdot)\}$ be the random
triplet of life time, censoring time and covariate process of dimension $d$.
Let $Y = \min(T, C)$, $\delta = I(T \le C)$, $N(t) = \delta I(T \le t)$ and $Y(t) = I(Y \ge t)$,
where $I$ is the indicator function throughout. Consider a cohort of size
$n$. Let $T_i, C_i, Z_i(\cdot), Y_i, \delta_i, N_i(\cdot), Y_i(\cdot)$, $i = 1, 2, \ldots, n$, be the i.i.d. sample
analogues. The full cohort data refer to $[(Y_i, \delta_i), \{Z_i(t) : t \in [0, Y_i]\} : i = 1, \ldots, n]$.
An n-c-c design takes a random sample of size $m$ (for covariate
ascertainment) from the risk set at every failure time, excluding the failed subject
itself. Let $R_t^*$ denote the index set of a size $m$ random sample selected
from all subjects with the minimum of failure and censoring times greater
than $t$. The extended n-c-c data refer to
$\{(Y_i, \delta_i) : i = 1, \ldots, n\} \cup [\{Z_i(t) : t \in [0, Y_i]\}, \{Z_j(t) : t \in [0, Y_j]\} : \delta_i = 1, j \in R^*_{Y_i}, i, j = 1, \ldots, n]$.
The time-restricted n-c-c data refer to
$[\{Y_i, Z_i(Y_i), Z_j(Y_i)\} : \delta_i = 1, j \in R^*_{Y_i}, i, j = 1, \ldots, n]$. With
the time-restricted n-c-c data, the exact censoring times for the nonfailures
are not necessarily specified and the covariate histories for controls (cases)
are not observed except at the time when they are sampled (fail).
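The sampling scheme just described can be sketched in a few lines. This is a minimal illustration, not the author's code: the function name `sample_nested_case_control`, its arguments and the rule of taking fewer than $m$ controls when the risk set is nearly exhausted are all assumptions made here for the example.

```python
import numpy as np

def sample_nested_case_control(y, delta, m, rng):
    """Risk set sampling sketch: for every failure, draw up to m controls.

    y     : observed times Y_i = min(T_i, C_i)
    delta : failure indicators (1 = failure, 0 = censored)
    m     : number of controls per case
    rng   : a numpy.random.Generator
    Returns a list of (case_index, sorted_control_indices) pairs.
    """
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta, dtype=bool)
    sampled = []
    for i in np.flatnonzero(delta):
        # Subjects whose observed time exceeds Y_i; this excludes the case itself.
        eligible = np.flatnonzero(y > y[i])
        k = min(m, eligible.size)  # assumption: take what is left if fewer than m remain
        if k == 0:
            controls = np.empty(0, dtype=int)
        else:
            controls = np.sort(rng.choice(eligible, size=k, replace=False))
        sampled.append((int(i), controls))
    return sampled
```

Only the failure times $Y_i$ of the cases and the covariate values of the sampled subjects at those times would then need to be recorded, which is exactly the time-restricted n-c-c data.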

The Cox proportional hazards model assumes that the conditional hazard
of $T$ given $Z$ satisfies

$$\lambda_T(t \mid Z = z) = \lambda_0(t)\exp\{\beta' z(t)\},$$

where $\beta$ is the parameter to be estimated and $\lambda_0(\cdot)$ is the baseline hazard
function. The life time $T$ and the censoring time $C$ are always assumed
conditionally independent given $Z$, which is assumed pathwise left continuous
with right limit. Cox's partial likelihood estimator of $\beta$ based on the full
cohort data, denoted by $\hat\beta_C$, is the solution of

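$$U_C(\beta) \equiv \sum_{i=1}^{n} \int_0^\tau \left[ Z_i(t) - \frac{\sum_{j \in R_t} Z_j(t)\exp\{\beta' Z_j(t)\}}{\sum_{j \in R_t} \exp\{\beta' Z_j(t)\}} \right] dN_i(t) = 0,$$

where $\tau = \sup\{t : \mathrm{pr}(Y > t) > 0\}$ and $R_t$ is the risk set at time $t$; that is,
$R_t = \{j : Y_j \ge t, j = 1, \ldots, n\}$. Thomas' partial likelihood estimator of $\beta$, denoted
by $\hat\beta_P$, is the solution of

$$U_P(\beta) \equiv \sum_{i=1}^{n} \int_0^\tau \left[ Z_i(t) - \frac{\sum_{j \in R_t^* \cup \{i\}} Z_j(t)\exp\{\beta' Z_j(t)\}}{\sum_{j \in R_t^* \cup \{i\}} \exp\{\beta' Z_j(t)\}} \right] dN_i(t) = 0.$$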

It is clear that Thomas' estimator uses only the time-restricted n-c-c data.
It is proved in Goldstein and Langholz (1992) (where $\tau$ is set to be 1 for
convenience) that, under certain regularity conditions,

$$n^{1/2}(\hat\beta_P - \beta) \to N(0, \Sigma_P^{-1}),$$

where $\Sigma_P = \Sigma_C - \Sigma_a$,

$$\Sigma_C = \int_0^\tau E[\{Z(t) - \mu(t)\}^{\otimes 2}\exp\{\beta' Z(t)\}Y(t)]\lambda_0(t)\,dt,$$

$$\Sigma_a = \frac{1}{m+1}\int_0^\tau E\left( \frac{[\sum_{i=1}^{m+1}\{Z_i(t) - \mu(t)\}\exp\{\beta' Z_i(t)\}]^{\otimes 2}}{\sum_{i=1}^{m+1}\exp\{\beta' Z_i(t)\}} \,\Big|\, Y_1 > t, \ldots, Y_{m+1} > t \right) \mathrm{pr}(Y > t)\lambda_0(t)\,dt$$

and

$$\mu(t) = \frac{E[Z(t)\exp\{\beta' Z(t)\}Y(t)]}{E[\exp\{\beta' Z(t)\}Y(t)]}.$$

Here and throughout the paper, $v^{\otimes 2} = vv'$ for any vector $v$ of dimension
$d$. Moreover, $-\dot U_P(v)|_{v = \hat\beta_P}/n$ is a consistent estimator of $\Sigma_P$. It is also well
known that $n^{1/2}(\hat\beta_C - \beta) \to N(0, \Sigma_C^{-1})$. Throughout the paper, the true value
of $\beta$ is still denoted by $\beta$.
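To make the computation concrete, the following is a minimal sketch of Thomas' score $U_P$ and a plain Newton iteration for a scalar covariate. The data layout (a list of `(z_case, z_controls)` pairs holding covariate values at each failure time) and the function names are assumptions made for illustration; the returned variance estimate is the inverse of minus the derivative of the score, as described above.

```python
import numpy as np

def thomas_score(beta, sets):
    """Return (U_P(beta), -dU_P/dbeta) for scalar covariates.

    sets: list of (z_case, z_controls), covariates evaluated at the failure time.
    """
    score, info = 0.0, 0.0
    for z_case, z_controls in sets:
        z = np.append(np.asarray(z_controls, dtype=float), z_case)
        lin = beta * z
        p = np.exp(lin - lin.max())      # subtract the max for numerical stability
        p /= p.sum()                     # weights exp(beta z_j) / sum_k exp(beta z_k)
        mean = np.sum(p * z)
        score += z_case - mean           # case covariate minus weighted average
        info += np.sum(p * z**2) - mean**2
    return score, info

def thomas_estimate(sets, beta0=0.0, tol=1e-10, max_iter=50):
    """Unsafeguarded Newton iteration; returns (beta_hat, estimated variance)."""
    beta = beta0
    for _ in range(max_iter):
        score, info = thomas_score(beta, sets)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = thomas_score(beta, sets)
    return beta, 1.0 / info
```

The sketch assumes the covariates vary within at least one sampled set, so that the information is positive.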

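Remark. Set

$$\xi(t) = \sum_{i=1}^{m+1}\big(\{Z_i(t) - \mu(t)\}\exp\{\beta' Z_i(t)\}\big) \Big/ \sum_{i=1}^{m+1}\exp\{\beta' Z_i(t)\}$$

for ease of notation. It follows from the expression of $\Sigma_P$ given in Goldstein
and Langholz (1992) that

$$\begin{aligned}
\Sigma_P &= \int_0^\tau E\left( \left[ Z_1(t) - \frac{\sum_{i=1}^{m+1} Z_i(t)\exp\{\beta' Z_i(t)\}}{\sum_{i=1}^{m+1}\exp\{\beta' Z_i(t)\}} \right]^{\otimes 2} \exp\{\beta' Z_1(t)\}Y_1(t) \,\Big|\, Y_2 > t, \ldots, Y_{m+1} > t \right) \lambda_0(t)\,dt \\
&= \int_0^\tau E\big( [Z_1(t) - \mu(t) - \xi(t)]^{\otimes 2} \exp\{\beta' Z_1(t)\} \,\big|\, Y_1 > t, \ldots, Y_{m+1} > t \big)\, \mathrm{pr}(Y > t)\lambda_0(t)\,dt \\
&= \int_0^\tau E\big( [Z_1(t) - \mu(t)]^{\otimes 2}\exp\{\beta' Z_1(t)\}Y_1(t) \big)\lambda_0(t)\,dt \\
&\quad - \int_0^\tau E\big( [Z_1(t) - \mu(t)]\,\xi(t)' \exp\{\beta' Z_1(t)\} \,\big|\, Y_1 > t, \ldots, Y_{m+1} > t \big)\, \mathrm{pr}(Y > t)\lambda_0(t)\,dt \\
&\quad - \int_0^\tau E\big( \xi(t)\,[Z_1(t) - \mu(t)]' \exp\{\beta' Z_1(t)\} \,\big|\, Y_1 > t, \ldots, Y_{m+1} > t \big)\, \mathrm{pr}(Y > t)\lambda_0(t)\,dt \\
&\quad + \int_0^\tau E\big( \xi(t)^{\otimes 2}\exp\{\beta' Z_1(t)\} \,\big|\, Y_1 > t, \ldots, Y_{m+1} > t \big)\, \mathrm{pr}(Y > t)\lambda_0(t)\,dt \\
&= \Sigma_C - \Sigma_a - \Sigma_a + \Sigma_a = \Sigma_C - \Sigma_a .
\end{aligned}$$

The expression of $\Sigma_P$ in the first line is intuitively well understandable from
the partial likelihood nature of the Thomas estimator $\hat\beta_P$.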
3. A new estimator and its inference. The motivation of the new
estimator is described in the following. Observe that the ratio in the expression
of $U_P$ can be viewed as an estimator of $\mu(t)$. This estimator, although
unbiased, uses only $m + 1$ observations: $m$ controls plus 1 case. Heuristically,
its estimation and accuracy can be improved by utilizing more observations
of relevance. One way to do so is to consider altogether the controls in $R_s^*$
for all failure times $s$ in a neighborhood of $t$. With a more accurate
estimation and approximation of $\mu(t)$ plus a proper weighting scheme, one
can presumably construct a new estimator of the regression parameters.
The details of the construction are as follows. Set $\bar N(s) = (1/n)\sum_{i=1}^{n} N_i(s)$,

$$b(t) = E[\exp\{\beta' Z(t)\} \mid Y > t],$$

$$w(t) = \frac{m b(t)}{\exp\{\beta' Z(t)\} + m b(t)}, \qquad g(t) = \frac{E[w(t)Z(t)\exp\{\beta' Z(t)\}Y(t)]}{E[w(t)\exp\{\beta' Z(t)\}Y(t)]},$$

and let $w_i(t)$ be the i.i.d. copies of $w(t)$. Let $\psi_n(x) = \psi(n^{1/3}x)$, $x \in (-\infty, \infty)$,
where $\psi$ is an infinitely differentiable nonnegative even function with bounded
support. For ease of notation, suppose the support of $\psi$ is $(-1, 1)$. Use

$$\hat b(t) = \frac{\int_0^\tau \sum_{j \in R_s^*} \exp\{\hat\beta_P' Z_j(s)\}\psi_n(t - s)\,d\bar N(s)}{m\int_0^\tau \psi_n(t - s)\,d\bar N(s)} \tag{3.1}$$

to approximate $b(t)$ and

$$\hat w_i(t) = \frac{m\hat b(t)}{\exp\{\hat\beta_P' Z_i(t)\} + m\hat b(t)} \tag{3.2}$$

to approximate $w_i(t)$, $i = 1, 2, \ldots, n$. Let

$$\hat S_k(t, \beta) = \int_0^\tau \sum_{j \in R_s^*} \hat w_j(s) Z_j^k(s)\exp\{\beta' Z_j(s)\}\psi_n(t - s)\,d\bar N(s), \qquad k = 0, 1, 2.$$

Throughout the paper, the power 2 on covariates $Z$ or $Z_j$, $j \ge 1$, always
means the outer product $\otimes 2$. The notation $A \ge B$ for any two $d \times d$ nonnegative
definite matrices means that $A - B$ is nonnegative definite. The proposed
estimator, denoted by $\hat\beta$, is the solution of

$$U(\beta) \equiv \sum_{i=1}^{n}\int_0^\tau \hat w_i(t)\left\{ Z_i(t) - \frac{\hat S_1(t, \beta)}{\hat S_0(t, \beta)} \right\} dN_i(t) = 0. \tag{3.3}$$

We note that $\hat S_1(t, \beta)/\hat S_0(t, \beta)$ may be viewed as an estimator of $g(t)$, a
weighted version of $\mu(t)$. Suppose, in the definition of $\hat S_k(t, \beta)$, we use 1
instead of the presently defined $\hat w_i$ as weights. Then $\hat S_1(t, \beta)/\hat S_0(t, \beta)$ is an
estimator of $\mu(t)$ which should heuristically be more accurate than its
counterpart in $U_P(\beta)$. The present choice of weights is optimal in the sense that
no other choices will produce estimators of $\beta$ with smaller asymptotic
variance; see further discussion in Section 5. As a result, (3.3) may produce
more accurate estimators than Thomas' estimator. The difference between
the weights $\hat w_i(t)$ used here and those used in Sasieni (1993b) is that $\hat w_i(t)$
depends on the covariate $Z_i(t)$ while those in Sasieni (1993b) do not.
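The construction in (3.1)-(3.3) can be sketched as follows for a scalar covariate. This is an illustrative reading of the formulas, not the author's implementation: the function name, the array layout, the uniform window $I(|t| < h)$ standing in for $\psi_n$ (the choice used in the numerical example of Section 4) and the unsafeguarded Newton iteration are all assumptions. The factor $1/n$ in $d\bar N$ cancels in every ratio and is omitted.

```python
import numpy as np

def proposed_estimate(times, z_case, z_ctrl, beta_p, h, tol=1e-10, max_iter=100):
    """Sketch of the estimator solving (3.3) from time-restricted n-c-c data.

    times  : failure times t_k, shape (K,)
    z_case : covariate of the k-th case at t_k, shape (K,)
    z_ctrl : covariates of its m sampled controls at t_k, shape (K, m)
    beta_p : Thomas' estimate, used only inside the weights
    h      : half-width of the uniform kernel window (hypothetical tuning value)
    Returns (beta_hat, estimated variance, score at beta_hat).
    """
    times = np.asarray(times, dtype=float)
    z_case = np.asarray(z_case, dtype=float)
    z_ctrl = np.asarray(z_ctrl, dtype=float)
    m = z_ctrl.shape[1]
    # psi[i, k] = psi_n(t_i - t_k); the diagonal is always 1
    psi = (np.abs(times[:, None] - times[None, :]) < h).astype(float)
    # (3.1): kernel-weighted average of exp(beta_p * Z) over sampled controls
    b_hat = psi @ np.exp(beta_p * z_ctrl).sum(axis=1) / (m * psi.sum(axis=1))
    # (3.2): weights for controls (at their sampling time) and for cases
    w_ctrl = m * b_hat[:, None] / (np.exp(beta_p * z_ctrl) + m * b_hat[:, None])
    w_case = m * b_hat / (np.exp(beta_p * z_case) + m * b_hat)

    def score_and_info(beta):
        e = w_ctrl * np.exp(beta * z_ctrl)
        s0 = psi @ e.sum(axis=1)                 # S_0(t_i, beta)
        s1 = psi @ (e * z_ctrl).sum(axis=1)      # S_1(t_i, beta)
        s2 = psi @ (e * z_ctrl**2).sum(axis=1)   # S_2(t_i, beta)
        score = np.sum(w_case * (z_case - s1 / s0))
        info = np.sum(w_case * (s2 / s0 - (s1 / s0) ** 2))  # minus the derivative
        return score, info

    beta = float(beta_p)
    for _ in range(max_iter):
        score, info = score_and_info(beta)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    score, info = score_and_info(beta)
    return beta, 1.0 / info, score
```

Because the weights are computed once from `beta_p`, the score is monotone in `beta`, which is what makes the root unique; a production version would still want a safeguarded root finder and protection against overflow in `exp`.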

Denote by $B_c$ the closed ball in $d$-dimensional real space centered at the
origin with radius $c > 0$, where $c$ is a large but fixed constant. Let $C^\infty(B_c)$
denote the set of all infinitely differentiable functions defined on $B_c$. Some
regularity conditions are assumed here:

(i) $\mathrm{pr}\{\sup_{t \in [0, \tau]} |Z(t)| < c\} = 1$;

(ii) the baseline hazard function $\lambda_0(t)$ is bounded away from 0 and
infinity on $[0, \tau]$ and has continuous second derivative;

(iii) $\mathrm{pr}\{Y(t) = 1\} > 0$ for all $t \in [0, \tau]$;

(iv) $\Sigma_P$ and $\Sigma$ defined in (3.5) are positive definite;

(v) for any $\phi(\cdot) \in C^\infty(B_c)$, $E[\phi\{Z(t)\}Y(t)]$ as a function of $t$ on $[0, \tau]$
has continuous second derivative;

(vi) for any $\phi(\cdot) \in C^\infty(B_c)$, the process $n^{-1/2}\sum_{i=1}^{n}(\phi\{Z_i(t)\}Y_i(t) - E[\phi\{Z(t)\}Y(t)])$,
as a process of $t$, converges to a Gaussian process on $[0, \tau]$ as $n \to \infty$.

Theorem. Assume the above conditions (i)-(vi) hold. Then

$$n^{1/2}(\hat\beta - \beta) \to N(0, \Sigma^{-1}), \tag{3.4}$$

where

$$\Sigma = \int_0^\tau E[w(t)\{Z(t) - g(t)\}^{\otimes 2}\exp\{\beta' Z(t)\}Y(t)]\lambda_0(t)\,dt. \tag{3.5}$$

Moreover, $-(1/n)\dot U(v)|_{v = \hat\beta}$ is a consistent estimator of $\Sigma$, where

$$\dot U(\beta) = -\sum_{i=1}^{n}\int_0^\tau \left[ \frac{\hat S_2(t, \beta)}{\hat S_0(t, \beta)} - \left\{ \frac{\hat S_1(t, \beta)}{\hat S_0(t, \beta)} \right\}^{\otimes 2} \right] \hat w_i(t)\,dN_i(t).$$

Remark. There are six conditions assumed in Goldstein and Langholz
(1992) to ensure the consistency and asymptotic normality of $\hat\beta_P$. Conditions
(i)-(iv) here are analogous to Conditions 2-5 in Goldstein and Langholz
(1992). Their Conditions 1 and 6 are implied in the model description in
Section 2 and therefore are not listed here. The inheritance of their
conditions is understandable since the proposed estimator $\hat\beta$ uses the Thomas
estimator $\hat\beta_P$. Conditions (i)-(iii) and (v) are necessary for using the
empirical approximations (e.g., the proof of part 1 of the Lemma in the Appendix)
to obtain the rates of convergence for various random quantities, as in the
Lemma in the Appendix. Condition (iii) here, parallel to Condition 4 in
Goldstein and Langholz (1992), can be relaxed with increasing
technicalities involving the tail behavior near the endpoint $\tau$. Condition (iv) validates
the asymptotic normality of $\hat\beta$ claimed in (3.4). It is also used in proving the
consistency of $\hat\beta$; see Step 3 of the proof of the Theorem in the Appendix.
The requirement of differentiability in conditions (ii) and (v), for
obtaining the bounds of kernel estimation, may appear to be restrictive in that
it does not allow, for example, $\mathrm{pr}(Y > t)$ and $E\{Z(t)\}$ to be discontinuous,
although pathwise discontinuous $Z(\cdot)$ are not excluded. In fact, the
differentiability requirements in conditions (ii) and (v) can be relaxed to piecewise
differentiability. Then, functions such as $\mathrm{pr}(Y > t)$ and $E\{Z(t)\}$ can be
discontinuous at a finite number of time points. The important case of fixed
censoring is also covered. The proof of the Theorem under such relaxed
conditions requires a careful but routine treatment of the edge effect caused
by the discontinuity points and the endpoints, which is partly reflected in
the Lemma. The current presentation was chosen to avoid a lengthy but not
essential technical argument. In particular, condition (v) is satisfied when
$\int P\{C > t \mid Z(t) = z\}\phi(z) f_t(z)\,d\mathcal{C}(z)$ as a function of $t$ on $[0, \tau]$ is piecewise
twice continuously differentiable, where $\phi(\cdot) \in C^\infty(B_c)$ and $f_t(z)$ is the
density of $Z(t)$ with respect to a measure $\mathcal{C}$ which can be a combination of
the Lebesgue measure and a counting measure. Condition (vi) is essentially
about the tightness of the sequence, which can be ensured by a certain
Lipschitz condition on the increments $Z(t) - Z(s)$. For example, it is satisfied if
there exist an $\alpha > 1$ and $A > 0$ such that $E(|Z(t) - Z(s)|^2) \le A|t - s|^\alpha$ for all
$s, t \in [0, \tau]$. More relaxed but technical conditions in terms of metric entropy
may be found in Pollard (1990) or van der Vaart and Wellner (1996).
Condition (vi) is used to obtain uniform bounds on $[0, \tau]$ for the sequences of random
processes concerned; see (A.13). Again, a piecewise version of condition (vi)
is sufficient.

Remark. One can use another version of the weights $\hat w_i(t)$ by replacing
$\hat\beta_P$ by $\beta$ in the definitions of $\hat b$ and $\hat w_i$ in (3.1) and (3.2). The advantage is
that one does not have to compute $\hat\beta_P$ first to obtain $\hat\beta$. The disadvantage
is that the estimating equation (3.3) may possibly have multiple roots. Still,
the same proof shows that one of the roots is consistent and asymptotically
normal with the same asymptotic variance $\Sigma^{-1}$.

Proposition. Assume the conditions of the Theorem hold. Then $\Sigma_P^{-1} \ge \Sigma^{-1}$
and equality holds if and only if $\beta = 0$.

This inequality justifies that the asymptotic variance of the proposed
estimator is smaller than that of Thomas' estimator away from the null.
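As a quick sanity check of the equality case (a worked special case under the definitions of Sections 2 and 3, with $\Sigma_a$ carrying the factor $1/(m+1)$; it is not part of the original proof), set $\beta = 0$. Then $\exp\{\beta' Z\} \equiv 1$, so $b(t) = 1$, $w(t) = m/(m+1)$ and $g(t) = \mu(t)$, and both information matrices reduce to the same multiple of $\Sigma_C$:

```latex
\begin{aligned}
\Sigma &= \int_0^\tau E\Big[\tfrac{m}{m+1}\{Z(t)-\mu(t)\}^{\otimes 2}Y(t)\Big]\lambda_0(t)\,dt
        = \frac{m}{m+1}\,\Sigma_C,\\
\Sigma_a &= \frac{1}{m+1}\int_0^\tau
        E\Big(\frac{[\sum_{i=1}^{m+1}\{Z_i(t)-\mu(t)\}]^{\otimes 2}}{m+1}\,\Big|\,
        Y_1>t,\ldots,Y_{m+1}>t\Big)\,\mathrm{pr}(Y>t)\lambda_0(t)\,dt
        = \frac{1}{m+1}\,\Sigma_C,\\
\Sigma_P &= \Sigma_C-\Sigma_a = \frac{m}{m+1}\,\Sigma_C = \Sigma .
\end{aligned}
```

The middle line uses that, at $\beta = 0$, the $m+1$ terms $Z_i(t) - \mu(t)$ are independent with mean zero given that all $m+1$ subjects are at risk, so the cross terms vanish.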

4. A numerical example. Some simulation results are presented in this
section. The covariate process is such that $Z(t) = 4tu_1 + u_2$, $t \in [0, 1]$, where
$u_1$ and $u_2$ are two independent random variables uniformly distributed on
$[-1, 1]$. The baseline hazard function $\lambda_0(t)$ is set to be constant at 1. We
consider separately two different types of censorship: the fixed censoring with
$\mathrm{pr}(C = 1) = 1$, and the random censoring with the conditional distribution
of $C$ given $Z$ being the uniform distribution on $[0, \min(1, |Z(0.25)|)]$. The
function $\psi_n(t)$ is chosen to be $I(|t| < 0.05)$. The regression parameter $\beta$ takes
values 0, 1 and 2 and the number of controls selected from each risk set is 1,
2 and 3. The sample size is 200. For each scenario, 2000 simulations are
conducted and Thomas' estimator and the proposed estimator are calculated.
For reference, we also calculate Cox's partial likelihood estimator based on
full cohort data. The results are presented in Table 1. Table 1 shows that the
proposed estimate has indeed smaller variances (and mean squared errors as
well) than Thomas' estimate if $\beta \ne 0$. When $\beta = 0$, the two estimates have
about the same variances. These simulation results are consistent
with the Theorem and the Proposition. In this example, when $\beta = 2$
the bias of Thomas' estimate appears to be relatively serious, while that
of the proposed estimate is always negligible. We also notice that, in a few
cases of this example, the variance estimation appears to be biased down for
Thomas' estimate and biased up for the proposed estimate. Typically, the
latter will result in conservative but still valid inferences. As sample size
increases, the bias tends to be negligible. This simulation
example thus supports the established theoretical results.

Table 1
Summary of the simulation results (a)

                          beta = 0                   beta = 1                   beta = 2
                   Ave Est  Emp Var  Est Var   Ave Est  Emp Var  Est Var   Ave Est  Emp Var  Est Var

Fixed censoring: C = 1
Cen Prop                     0.368                      0.603                      0.667
Cox                 -0.004    0.013    0.012     1.012    0.030    0.029     2.025    0.067    0.063
m = 1  Thomas       -0.003    0.026    0.025     1.039    0.089    0.087     2.127    0.316    0.299
       Proposed     -0.004    0.025    0.027     0.989    0.065    0.081     1.962    0.162    0.197
m = 2  Thomas       -0.003    0.019    0.018     1.034    0.059    0.058     2.096    0.189    0.174
       Proposed     -0.003    0.019    0.019     1.031    0.049    0.056     2.028    0.123    0.140
m = 3  Thomas       -0.001    0.017    0.016     1.034    0.054    0.048     2.073    0.142    0.134
       Proposed     -0.001    0.017    0.017     1.039    0.047    0.048     2.041    0.106    0.116

Random censoring: C | Z ~ U[0, min(1, |Z(0.25)|)]
Cen Prop                     0.771                      0.849                      0.843
Cox                 -0.007    0.059    0.056     1.009    0.080    0.079     2.018    0.124    0.121
m = 1  Thomas        0.009    0.138    0.123     1.106    0.336    0.301     2.231    0.845    0.880
       Proposed     -0.001    0.126    0.147     0.991    0.195    0.276     1.963    0.376    0.453
m = 2  Thomas       -0.011    0.094    0.087     1.063    0.189    0.176     2.159    0.481    0.435
       Proposed     -0.013    0.095    0.095     1.031    0.145    0.176     2.024    0.256    0.300
m = 3  Thomas       -0.003    0.082    0.078     1.069    0.154    0.144     2.137    0.370    0.334
       Proposed     -0.006    0.082    0.082     1.057    0.133    0.147     2.043    0.205    0.243

(a) "Ave Est" and "Emp Var" stand for the averages and empirical variances of the estimates
over 2000 simulations. "Est Var" stands for the average of the estimated variances over
2000 simulations. "Cen Prop" stands for the proportion of censoring. "Cox," "Thomas"
and "Proposed" refer, respectively, to the Cox estimate based on full cohort data, Thomas'
estimate and the proposed estimates based on time-restricted n-c-c data.
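The data-generating design of this section can be sketched as below. The function name and return layout are assumptions for illustration. The failure time is drawn by inverting the cumulative hazard: with $\lambda_0 \equiv 1$ and $Z(t) = 4tu_1 + u_2$, one has $\Lambda(t) = e^{\beta u_2}(e^{at} - 1)/a$ with $a = 4\beta u_1$, so $T$ solves $\Lambda(T) = E$ for a unit exponential $E$ (and $T = \infty$ when the cumulative hazard never reaches $E$).

```python
import numpy as np

def simulate_cohort(n, beta, rng, random_censoring=False):
    """Simulate (u1, u2, y, delta) for Z(t) = 4*t*u1 + u2 and unit baseline hazard."""
    u1 = rng.uniform(-1.0, 1.0, n)
    u2 = rng.uniform(-1.0, 1.0, n)
    e = rng.exponential(1.0, n)        # unit exponential; T solves Lambda(T) = e
    a = 4.0 * beta * u1                # beta * Z(t) = a * t + beta * u2
    c0 = e * np.exp(-beta * u2)        # T solves (exp(a*T) - 1) / a = c0
    with np.errstate(divide="ignore", invalid="ignore"):
        t = np.where(
            np.abs(a) < 1e-12,
            c0,                                                  # limit as a -> 0
            np.where(a * c0 > -1.0, np.log1p(a * c0) / a, np.inf),
        )
    if random_censoring:
        # C | Z ~ U[0, min(1, |Z(0.25)|)], and Z(0.25) = u1 + u2
        c = rng.uniform(0.0, np.minimum(1.0, np.abs(u1 + u2)))
    else:
        c = np.ones(n)                 # fixed censoring at C = 1
    y = np.minimum(t, c)
    delta = t <= c
    return u1, u2, y, delta
```

With $\beta = 0$ and fixed censoring the censoring proportion is $e^{-1} \approx 0.368$, which matches the "Cen Prop" entry of Table 1 for that scenario.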

5. Closing remarks. In summary, the proposed estimate $\hat\beta$ is
asymptotically more accurate than Thomas' estimate away from the null and it uses
only the time-restricted n-c-c data which are strictly necessary for Thomas'
estimate. This estimate is relatively easy to compute: the estimating
equation takes a simple closed form and has a unique solution. Its inference is
equally easy to obtain as the variance estimate is simply minus the inverse
of the derivative of the estimating function. Unlike the case of curve
estimation, the problem of (optimal) bandwidth choice is much less significant
here. In the definition of $\psi_n$, the order of bandwidth is $n^{-1/3}$. In fact, with
little modification of the proof, the Theorem holds for all bandwidths of
order $n^{-r}$ with $r \in (1/4, 1/2)$. At least within this range, the choice of
bandwidth does not affect the first-order asymptotic behavior of the estimate.
In practice, however, a proper objective or data-driven choice of bandwidth
should be valuable for the implementation of the estimation procedure.
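For concreteness, one kernel that meets the requirements placed on $\psi$ (infinitely differentiable, nonnegative, even, supported on $(-1, 1)$) is the standard bump function. The sketch below is an illustrative choice rather than the kernel used in the paper; the names `bump` and `psi_n` and the parameter `r` are hypothetical.

```python
import numpy as np

def bump(x):
    """Smooth, even, nonnegative kernel supported on (-1, 1)."""
    x = np.asarray(x, dtype=float)
    inside = np.abs(x) < 1.0
    safe = np.where(inside, 1.0 - x**2, 1.0)   # avoid division by zero outside the support
    return np.where(inside, np.exp(-1.0 / safe), 0.0)

def psi_n(x, n, r=1.0 / 3.0):
    """Rescaled kernel psi(n**r * x), i.e. bandwidth of order n**(-r)."""
    return bump(n**r * np.asarray(x, dtype=float))
```

Varying `r` within $(1/4, 1/2)$ changes the window width but, by the remark above, not the first-order asymptotics.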

Although it is not clear whether the proposed estimator is semiparametric
efficient based on the time-restricted n-c-c data, it does have the following
optimality. Consider the class of estimators as solutions of

$$\sum_{i=1}^{n}\int_0^\tau \left[ h\{Z_i(t), t, \beta\} - \frac{\int_0^\tau \sum_{j \in R_s^*} h\{Z_j(s), s, \beta\}\exp\{\beta' Z_j(s)\}\psi_n(t - s)\,d\bar N(s)}{\int_0^\tau \sum_{j \in R_s^*} \exp\{\beta' Z_j(s)\}\psi_n(t - s)\,d\bar N(s)} \right] dN_i(t) = 0,$$

where $h$ is any bounded infinitely differentiable function. Heuristically, if
$h\{Z_j(t), t, \beta\}$ is replaced by $\hat w_j(t)\{Z_j(t) - \hat S_1(t, \beta)/\hat S_0(t, \beta)\}$, it can be shown
that the above estimating function is $\sum_{i=1}^{n}\int_0^\tau h\{Z_i(t), t, \beta\}\,dN_i(t) + o_P(n^{1/2})$.
Hence the above equation with this particular choice of $h$ is asymptotically
equivalent to (3.3) in the sense that the resulting estimators of $\beta$ are
asymptotically equivalent. In this asymptotic sense, (3.3) might be viewed as
approximately a member of the above general class of estimating equations.
More important, it can be proved under regularity conditions that the
theoretical optimal choice of $h\{Z_j(t), t, \beta\}$ is $w_j(t)\{Z_j(t) - g(t)\}$, which in actual
construction is approximated by $\hat w_j(t)\{Z_j(t) - \hat S_1(t, \beta)/\hat S_0(t, \beta)\}$. It implies
that no other choice of $h$ used in the above estimating equation shall
result in estimators of $\beta$ with asymptotic variance smaller than $\Sigma^{-1}$ and that
no other choice of weights used in (3.3) will produce estimators of $\beta$ with
asymptotic variance smaller than $\Sigma^{-1}$.



APPENDIX 

Proofs of the Theorem and the Proposition. More notation is needed.
Throughout the Appendix, the notation $|\cdot|$ for a vector or matrix means the
sum of the absolute values of all elements. Set $n_t = \sum_{i=1}^{n} I(Y_i > t)$, $f(t) =
E[\exp\{\beta' Z(t)\}Y(t)]\lambda_0(t)$ and $h_k(t) = E[w(t)Z^k(t)\exp\{\beta' Z(t)\} \mid Y > t]$, $k =
0, 1, 2$. Notice that together conditions (i), (ii) and (v) ensure that $f(t)$ is
bounded above, bounded away from 0 and has a continuous second derivative on
$[0, \tau]$. Much of the proof relies on counting process martingale techniques;
see, for example, Andersen and Gill (1982). The following lemma provides
the approximations used in the proof of the Theorem.



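Lemma. Assume conditions (i)-(vi) hold. Let $\varepsilon > 0$ be an arbitrary number.

1. Let $f_n(t) = n^{1/3}E\{\psi_n(t - Y)\delta\}$ and $a_n(t) = n^{1/3}E\{\delta(Y - t)\psi_n(Y - t)\}$.
Then

$$\sup_{0 \le t \le \tau}\left| n^{1/3}\int_0^\tau \psi_n(t - s)\,d\bar N(s) - f_n(t) \right| = O_P(n^{-1/3+\varepsilon}), \tag{A.1}$$

$$\sup_{0 \le t \le \tau}\left| n^{1/3}\int_0^\tau (s - t)\psi_n(t - s)\,d\bar N(s) - a_n(t) \right| = O_P(n^{-2/3+\varepsilon}). \tag{A.2}$$

Moreover, $f_n(\cdot)$ and $a_n(\cdot)$ are continuous on $[0, \tau]$, satisfying

$$0 < \inf_n \inf_{t \in [0, \tau]} f_n(t) \le \sup_n \sup_{t \in [0, \tau]} f_n(t) < \infty, \qquad \sup_{t \in [0, \tau]} |a_n(t)| = O(n^{-1/3}), \tag{A.3}$$

and $f_n(t) = f(t)$ and $a_n(t) = 0$ for $t \in (n^{-1/3}, \tau - n^{-1/3})$.

2. Let $\tilde S_k(t, \beta)$ be defined the same as $\hat S_k(t, \beta)$, $k = 0, 1, 2$, except with $\hat w_i$
replaced by $w_i$. Then

$$\sup_{t \in [0, \tau]} |\hat b(t) - b(t)| = O_P(n^{-1/3+\varepsilon}); \tag{A.4}$$

$$\sup_{1 \le i \le n}\,\sup_{t \in [0, \tau]} |\hat w_i(t) - w_i(t)| = O_P(n^{-1/3+\varepsilon}); \tag{A.5}$$

$$\sup_{t \in [0, \tau]}\left| \frac{\hat S_k(t, \beta) - \tilde S_k(t, \beta)}{m\int_0^\tau \psi_n(t - s)\,d\bar N(s)} \right| = O_P(n^{-1/3+\varepsilon}); \tag{A.6}$$

$$\sup_{t \in [0, \tau]}\left| \frac{\tilde S_k(t, \beta)}{m\int_0^\tau \psi_n(t - s)\,d\bar N(s)} - E[w(t)Z^k(t)\exp\{\beta' Z(t)\} \mid Y > t] \right| = O_P(n^{-1/3+\varepsilon}); \tag{A.7}$$

$$\sup_{t \in [0, \tau]}\left| \frac{\tilde S_1(t, \beta)}{\tilde S_0(t, \beta)} - g(t) \right| = O_P(n^{-1/3+\varepsilon}); \tag{A.8}$$

$$\sup_{s \in [0, \tau]}\left| \frac{1}{n}\int_0^\tau \psi_n(t - s)\frac{\sum_{i=1}^{n} w_i(t)\,dN_i(t)}{\tilde S_0(t, \beta)} - \frac{\lambda_0(s)P(Y > s)}{m f_n(s)} \right| = O_P(n^{-1/3+\varepsilon}), \tag{A.9}$$

where $\lambda_0(s)P(Y > s)/\{m f_n(s)\} = 1/\{m b(s)\}$ for $s \in (n^{-1/3}, \tau - n^{-1/3})$.
Moreover, (A.7)-(A.9) also hold if $\tilde S_k(t, \beta)$ is replaced by $\hat S_k(t, \beta)$.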

Proof. 1. Observe that $\mathrm{var}\{\psi_n(t - Y)\delta\} \le \int_0^\tau \psi_n^2(t - s)f(s)\,ds = O(n^{-1/3})$
and write

$$n^{1/3}\int_0^\tau \psi_n(t - s)\,d\bar N(s) - f_n(t) = \frac{n^{1/3}}{n}\sum_{i=1}^{n}\big[\psi_n(t - Y_i)\delta_i - E\{\psi_n(t - Y)\delta\}\big].$$

Set $M$ as a large but fixed number. It follows from Bernstein's inequality
[see, e.g., van der Vaart and Wellner (1996), page 102] that

$$P\left( n^{-1/3}\left| \sum_{i=1}^{n}\big[\psi_n(t - Y_i)\delta_i - E\{\psi_n(t - Y)\delta\}\big] \right| > n^{\varepsilon} \right) \le 2n^{-M}$$

for all large $n$. Set $A_n = \{k\tau/n^2 : k = 0, 1, \ldots, n^2\}$. The above exponential
inequality ensures, through the Borel-Cantelli lemma, that

$$\sup_{t \in A_n} n^{-1/3}\left| \sum_{i=1}^{n}\big[\psi_n(t - Y_i)\delta_i - E\{\psi_n(t - Y)\delta\}\big] \right| = O(n^{\varepsilon})$$

almost surely. Extending the supremum over $A_n$ to over $[0, \tau]$, the above
equality still holds by the differentiability of the kernel function $\psi$. Therefore
(A.1) holds. (A.2) can be proved in a similar fashion. (A.3) and the rest of
the claims can be verified by direct calculation using Taylor expansion.
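For readers who want the exponential bound spelled out, here is one way the step can be filled in; the constants $K$ and $C$ are introduced only for this illustration. Bernstein's inequality states that for independent mean-zero $X_1, \ldots, X_n$ with $|X_i| \le K$ and $v \ge \mathrm{var}(X_1 + \cdots + X_n)$,

```latex
P\Big(\Big|\sum_{i=1}^{n} X_i\Big| > x\Big) \le 2\exp\Big\{-\frac{1}{2}\,\frac{x^2}{v + Kx/3}\Big\}.
% Take X_i = \psi_n(t-Y_i)\delta_i - E\{\psi_n(t-Y)\delta\}, so K = 2\sup\psi,
% v = C n^{2/3} (since each variance is O(n^{-1/3})), and x = n^{1/3+\varepsilon}.
% For \varepsilon < 1/3 and n large, Kx/3 \le C n^{2/3}, hence
\frac{1}{2}\,\frac{x^2}{v + Kx/3} \;\ge\; \frac{n^{2/3+2\varepsilon}}{4Cn^{2/3}} \;=\; \frac{n^{2\varepsilon}}{4C},
\qquad
2\exp\Big\{-\frac{n^{2\varepsilon}}{4C}\Big\} \le 2n^{-M} \ \text{ for all large } n .
```

Larger values of $\varepsilon$ only shrink the event, so the restriction $\varepsilon < 1/3$ loses nothing. Choosing $M > 3$ makes the bound summable over the $n^2 + 1$ grid points of $A_n$, which is what the Borel-Cantelli step requires.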

2. Set $d_j(t) = [\exp\{\beta' Z_j(t)\} - b(t)]Y_j(t+)$. Observe (A.1)-(A.3) and that
$\hat\beta_P - \beta = O_P(n^{-1/2})$. One can apply Taylor expansion and write

$$\begin{aligned}
\hat b(t) - b(t)
&= \frac{\int_0^\tau \sum_{j \in R_s^*}[\exp\{\hat\beta_P' Z_j(s)\} - b(s)]\psi_n(t - s)\,d\bar N(s) + m\int_0^\tau \{b(s) - b(t)\}\psi_n(t - s)\,d\bar N(s)}{m\int_0^\tau \psi_n(t - s)\,d\bar N(s)} \\
&= \frac{1}{m\int_0^\tau \psi_n(t - s)\,d\bar N(s)}\Big( \int_0^\tau \sum_{j \in R_s^*}[\exp\{\hat\beta_P' Z_j(s)\} - \exp\{\beta' Z_j(s)\}]Y_j(s+)\psi_n(t - s)\,d\bar N(s) \\
&\qquad + \int_0^\tau \sum_{j \in R_s^*}[\exp\{\beta' Z_j(s)\} - b(s)]Y_j(s+)\psi_n(t - s)\,d\bar N(s)
 + m\dot b(t)\int_0^\tau (s - t)\psi_n(t - s)\,d\bar N(s) \Big) + O_P(n^{-2/3}) \\
&= (\hat\beta_P - \beta)' E[Z(t)\exp\{\beta' Z(t)\} \mid Y > t] \\
&\qquad + \frac{n^{1/3}}{m f_n(t)}\int_0^\tau \sum_{j=1}^{n}\Big\{ I(j \in R_s^*) - \frac{m}{n_s} \Big\} d_j(s)\psi_n(t - s)\,d\bar N(s) \\
&\qquad + \frac{n^{1/3}}{n f_n(t)P(Y > t)}\sum_{j=1}^{n}\int_0^\tau d_j(s)\psi_n(t - s)f(s)\,ds \\
&\qquad + \dot b(t)a_n(t)/f_n(t) + o_P(n^{-1/2}).
\end{aligned} \tag{A.10}$$

Here and in the following, $O_P(\cdot)$ and $o_P(\cdot)$ are uniform over $t \in [0, \tau]$. The
first term in the last expression is $O_P(n^{-1/2})$ by the asymptotic normality of
$\hat\beta_P$ established in Goldstein and Langholz (1992). Let $\mathcal{F}$ denote the $\sigma$-algebra
generated by $[\{Y_i, \delta_i, Z_i(\cdot)\}, i = 1, 2, \ldots]$. The integrands of the second term
are conditionally independent with conditional mean zero when conditioning
on $\mathcal{F}$. Thus, the empirical approximation analogous to the proof of (A.1)
can be applied to show the second term is $O_P(n^{-1/3+\varepsilon})$. Since $d_j(t)$ is
uniformly bounded with mean zero, the third term can be similarly shown to
be $O_P(n^{-1/3+\varepsilon})$. The fourth term is $O_P(n^{-1/3+\varepsilon})$ by part 1. Therefore (A.4)
is proved.

To show (A.5), apply the mean value theorem and write

$$\hat w_i(t) - w_i(t) = w_i(t)\{1 - w_i(t)\}\big[\{\hat b(t) - b(t)\}/b(t) - (\hat\beta_P - \beta)' Z_i(t)\big] + o_P(n^{-1/2}), \tag{A.11}$$

where $o_P(\cdot)$ is uniform over $[0, \tau]$. Then (A.5) follows from (A.4), the
boundedness of $Z(\cdot)$ in condition (i) and the asymptotic normality of $\hat\beta_P$.

Equation (A.6) follows directly from (A.5) and the definitions of $\hat S_k(t, \beta)$
and $\tilde S_k(t, \beta)$.
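The linearization behind (A.11) comes from differentiating the weight function in its two arguments. As a short worked check (writing $e^{\beta' z}$ for $\exp\{\beta' Z_i(t)\}$ and suppressing $t$; this display is added for illustration):

```latex
w = \frac{mb}{e^{\beta' z} + mb}, \qquad
\frac{\partial w}{\partial b} = \frac{m\,e^{\beta' z}}{(e^{\beta' z} + mb)^2} = \frac{w(1-w)}{b}, \qquad
\frac{\partial w}{\partial \beta} = -\frac{mb\,e^{\beta' z}}{(e^{\beta' z} + mb)^2}\,z = -\,w(1-w)\,z .
```

A first-order expansion around $(b, \beta)$ then gives the leading term $w(1 - w)[(\hat b - b)/b - (\hat\beta_P - \beta)' z]$, and the second-order remainder is $O_P(n^{-2/3+2\varepsilon}) = o_P(n^{-1/2})$ by (A.4) for small $\varepsilon$.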

To show (A.7), let $\tilde h_k(t) = (1/n_t)\sum_{j=1}^{n} w_j(t)Z_j^k(t)\exp\{\beta' Z_j(t)\}Y_j(t+)$ and
recall the definition of $h_k(\cdot)$. Conditions (v) and (vi) imply that $\sup_{t \in [0, \tau)} |\tilde h_k(t) -
h_k(t)| = O_P(n^{-1/2})$. Therefore we can write

$$\begin{aligned}
\frac{\tilde S_k(t, \beta)}{m\int_0^\tau \psi_n(t - s)\,d\bar N(s)} - h_k(t)
&= \frac{\int_0^\tau \sum_{j \in R_s^*}[w_j(s)Z_j^k(s)\exp\{\beta' Z_j(s)\} - h_k(t)]\psi_n(t - s)\,d\bar N(s)}{m\int_0^\tau \psi_n(t - s)\,d\bar N(s)} \\
&= \frac{\int_0^\tau \sum_{j \in R_s^*}[w_j(s)Z_j^k(s)\exp\{\beta' Z_j(s)\} - h_k(s)]\psi_n(t - s)\,d\bar N(s)}{m\int_0^\tau \psi_n(t - s)\,d\bar N(s)} + O_P(n^{-1/3+\varepsilon}) \\
&= \frac{\int_0^\tau \sum_{j \in R_s^*}[w_j(s)Z_j^k(s)\exp\{\beta' Z_j(s)\} - \tilde h_k(s)]\psi_n(t - s)\,d\bar N(s)}{m\int_0^\tau \psi_n(t - s)\,d\bar N(s)} + O_P(n^{-1/3+\varepsilon}),
\end{aligned}$$

where the order $O_P(\cdot)$ is uniform over $t \in [0, \tau)$. Notice that the integrands
in the numerator are bounded and conditionally independent with conditional
mean zero when conditioning on $\mathcal{F}$. Then (A.7) can be shown by applying
the empirical approximation analogous to the proof of (A.1).

Equation (A.8) follows from the definition of $g(\cdot)$, (A.6) and (A.7).
To show (A.9), recall that $h_0(t)$ is defined as $E[w(t)\exp\{\beta' Z(t)\} \mid Y > t]$.
Use (A.1) and (A.7) and write

$$\begin{aligned}
\frac{1}{n}\int_0^\tau \psi_n(t - s)\frac{\sum_{i=1}^{n} w_i(t)\,dN_i(t)}{\tilde S_0(t, \beta)}
&= \frac{1}{n}\int_0^\tau \frac{n^{1/3}\psi_n(t - s)\sum_{i=1}^{n} w_i(t)\,dN_i(t)}{m f_n(t)h_0(t)} + O_P(n^{-1/3+\varepsilon}) \\
&= \frac{1}{n}\sum_{i=1}^{n}\frac{n^{1/3}\psi_n(Y_i - s)w_i(Y_i)\delta_i}{m f_n(Y_i)h_0(Y_i)} + O_P(n^{-1/3+\varepsilon}) \\
&= E\left[ \frac{n^{1/3}\psi_n(Y - s)w(Y)\delta}{m f_n(Y)h_0(Y)} \right] + O_P(n^{-1/3+\varepsilon}) \\
&= \int_0^\tau \frac{n^{1/3}\psi_n(t - s)\lambda_0(t)P(Y > t)}{m f_n(t)}\,dt + O_P(n^{-1/3+\varepsilon}) \\
&= \frac{\lambda_0(s)P(Y > s)}{m f_n(s)} + O_P(n^{-1/3+\varepsilon}),
\end{aligned} \tag{A.12}$$

where $O_P(\cdot)$ is uniform over $s \in [0, \tau]$. In the above equations, the third
equality can be proved analogously to the proof of (A.1). The details are
omitted. That $\lambda_0(s)P(Y > s)/\{m f_n(s)\} = 1/\{m b(s)\}$ for $s \in (n^{-1/3}, \tau - n^{-1/3})$
follows from the result of part 1 and the definitions of $b(\cdot)$ and $f_n(\cdot)$.

Equations (A.7)-(A.9) also hold if $\tilde S_k(t, \beta)$ is replaced by $\hat S_k(t, \beta)$ because
of (A.6). The proof of this lemma is complete. □

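Proof of the Theorem. Define $M(t) = N(t) - \int_0^t \exp\{\beta' Z(s)\}Y(s)\lambda_0(s)\,ds$.
Let $M_i(t)$, $i = 1, 2, \ldots$, be the i.i.d. copies of $M(t)$,

$$g_n(t) = \frac{\sum_{i=1}^{n} w_i(t)Z_i(t)\exp\{\beta' Z_i(t)\}Y_i(t+)}{\sum_{i=1}^{n} w_i(t)\exp\{\beta' Z_i(t)\}Y_i(t+)}$$

and

$$\tilde U(\beta) = \sum_{i=1}^{n}\int_0^\tau \left\{ Z_i(t) - \frac{\tilde S_1(t, \beta)}{\tilde S_0(t, \beta)} \right\} w_i(t)\,dN_i(t).$$

Then condition (vi) implies that, for any $\varepsilon_n \downarrow 0$,

$$\sup_{|t - s| \le \varepsilon_n,\ 0 \le t, s \le \tau} |g_n(t) - g_n(s) - g(t) + g(s)| = o_P(n^{-1/2}). \tag{A.13}$$

The rest of the proof is divided into four steps.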
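Step 1 [To show $n^{-1/2}\tilde U(\beta) \to N(0, \Sigma)$]. Apply (A.1) and write

$$\begin{aligned}
\tilde U(\beta)
&= \sum_{i=1}^{n}\int_0^\tau w_i(t)\{Z_i(t) - g_n(t-)\}\,dN_i(t) - \sum_{i=1}^{n}\int_0^\tau \left\{ \frac{\tilde S_1(t, \beta)}{\tilde S_0(t, \beta)} - g_n(t-) \right\} w_i(t)\,dN_i(t) \\
&= \sum_{i=1}^{n}\int_0^\tau w_i(t)\{Z_i(t) - g_n(t-)\}\,dM_i(t) \\
&\qquad - \int_0^\tau \left[ \int_0^\tau \sum_{j \in R_s^*} w_j(s)\{Z_j(s) - g_n(t-)\}\exp\{\beta' Z_j(s)\}\psi_n(t - s)\,d\bar N(s) \right] \frac{\sum_{i=1}^{n} w_i(t)\,dN_i(t)}{\tilde S_0(t, \beta)} \\
&= \sum_{i=1}^{n}\int_0^\tau w_i(t)\{Z_i(t) - g_n(t-)\}\,dM_i(t) \\
&\qquad - \int_0^\tau \left[ \int_0^\tau \sum_{j \in R_s^*} w_j(s)\{Z_j(s) - g_n(s)\}\exp\{\beta' Z_j(s)\}\psi_n(t - s)\,d\bar N(s) \right] \frac{\sum_{i=1}^{n} w_i(t)\,dN_i(t)}{\tilde S_0(t, \beta)} \\
&\qquad - \int_0^\tau \int_0^\tau \{g_n(s) - g_n(t-)\}\sum_{j \in R_s^*} w_j(s)\exp\{\beta' Z_j(s)\}\psi_n(t - s)\,d\bar N(s)\,\frac{\sum_{i=1}^{n} w_i(t)\,dN_i(t)}{\tilde S_0(t, \beta)} \\
&= H_1 + H_2 + H_3, \quad \text{say.}
\end{aligned}$$

We first show $H_3 = o_P(n^{1/2})$. In view of (A.13), it is seen that $H_3$ differs
by a term of order $o_P(n^{1/2})$ when $g_n(s) - g_n(t-)$ is replaced by $g(s) - g(t-)$.
Notice that condition (v) ensures the differentiability of $g(\cdot)$. Therefore

$$\sup\{|g(s) - g(t)|\psi_n(t - s) : t, s \in [0, \tau)\} = O(n^{-1/3})$$

since $\psi$ has bounded support. Then, using the delta method, $H_3$ can be
reduced to

$$\begin{aligned}
&-n^{4/3}\int_0^\tau \int_0^\tau \{g(s) - g(t)\}\psi_n(t - s)\,ds\,E[w(t)\exp\{\beta' Z(t)\}Y(t)]\lambda_0(t)\,dt + o_P(n^{1/2}) \\
&\qquad = O(n^{1/3}) + o_P(n^{1/2}) \\
&\qquad = o_P(n^{1/2})
\end{aligned}$$

by condition (v) and the result of part 1.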

We next show the asymptotic normality of S2. Recall that J- is the a- 
algebra generated by [{Yi,5i, Zi(-)} :i = 1,2,...]. For every t, the integral 
in the brackets, conditioning on T, has mean and, when normalized by 
Jo ^n^t — s) dN(s), can be shown to be Op{n~ 1 ^ +e ) uniformly over t E [0, r] 
for any fixed e > 0. Thus, in view of (A. 9), the delta method can be applied 
to show that H2 is 



/ J2 tij(s){Zj(s) ~9n(s)} 



x e3qp{P' Zj(s)} 



T 1pn(t — s) 



s (t,p) tri 



dN(s) 



*i( s ){ Z i( s ) -9n(s)} 

1 n 

x expipfZjis)}——, Y, dN ^ s ) + Mn 1/2 ), 
mb(s) f— * 

where the main term, conditioning on T, is the sum of bounded random 
variables with conditional mean 0. Therefore it converges at the rate n 1 / 2 
to a normal distribution with mean and asymptotic variance 

w(t) 2 



E 



mb{t) 



{Z(t)-g(t)}® 2 exp{2P'Z(t)}Y(t) 



\ Q (t)dt 



E[w(t){l - w(t)}{Z(t) - g(t)}® 2 eMP'Z(t)}Y(t)]X (t) dt. 
Consider the first term Hi and write 

h = J2 f T Wi(s)[Zi(a) - g n (s-)]dMi(s) 

= E f T Ms)[Zi(s) - 9(s)]dMi(t) -J2 [ T Wi{s)[g n (s-) - g(s)}dMi(t) 



8=1 

n 



8=1 



V f Wi(s)[Zi(s) - g(s)} dMi(t) + opin 1 ' 2 ) 



The main term in the above expression is the sum of i.i.d. bounded random 
variables with mean and variance Si = var[/J~ w(t){Z(t) — g(t)} dM(t)\. 
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Therefore Hi converges at the rate n 1 / 2 to a normal distribution with mean 
and variance Si defined above. 

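Last, combine the limits of the terms $H_1$, $H_2$ and $H_3$, and notice that $H_1$ is
$\mathcal{F}$-measurable and that the asymptotic normality for $H_2$ holds conditioning
on $\mathcal{F}$. Observe that $\Sigma_1 = \int_0^\tau E[w(s)^2\{Z(s) - g(s)\}^{\otimes 2}\exp\{\beta' Z(s)\}Y(s)]\lambda_0(s)\,ds$.
It follows that

\[ n^{-1/2}(H_1 + H_2 + H_3) \to N(0, \Sigma_2 + \Sigma_1) = N(0, \Sigma), \]

where $\Sigma_2$ denotes the asymptotic variance of $H_2$ given above.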
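The way the two limits combine can be spelled out with characteristic functions. This is a minimal sketch, not the paper's own argument: it writes $\Sigma_2$ for the asymptotic variance of $H_2$, and it assumes $\Sigma = \int_0^\tau E[w(t)\{Z(t)-g(t)\}^{\otimes 2}\exp\{\beta' Z(t)\}Y(t)]\lambda_0(t)\,dt$, the form consistent with the quantities used in the proof of the Proposition.

```latex
% Sketch: a is any fixed vector in R^d; H_3 = o_P(n^{1/2}) is dropped.
\begin{align*}
E\exp\{i a' n^{-1/2}(H_1 + H_2)\}
  &= E\big[\exp\{i a' n^{-1/2} H_1\}\,
       E\big(\exp\{i a' n^{-1/2} H_2\} \mid \mathcal{F}\big)\big] \\
  &\to \exp(-a'\Sigma_1 a/2)\,\exp(-a'\Sigma_2 a/2), \\
\Sigma_1 + \Sigma_2
  &= \int_0^\tau E\big[\{w(t)^2 + w(t)(1-w(t))\}\{Z(t)-g(t)\}^{\otimes 2}
       \exp\{\beta' Z(t)\}Y(t)\big]\lambda_0(t)\,dt \\
  &= \int_0^\tau E\big[w(t)\{Z(t)-g(t)\}^{\otimes 2}
       \exp\{\beta' Z(t)\}Y(t)\big]\lambda_0(t)\,dt = \Sigma .
\end{align*}
```

The first equality uses that $H_1$ is $\mathcal{F}$-measurable; the limit uses that the conditional characteristic function of $n^{-1/2}H_2$ is bounded and converges in probability to a nonrandom limit.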



Step 2 [To show $n^{-1/2}U(\beta) \to N(0, \Sigma)$]. The following equalities use the
approximations (A.5)-(A.9) and the delta method:

U(0)-U{p) 



t=i 



E / HW-^W) W) 



U7j 



5 (t,/3) 

Si(t,/3) Si (*,/?) 
SoM) 5 (t,/3) 



dNi(t) 



i=l 



Wi(t) 



S o (t,0) 
Si(t,p) 



S 2 (t,P) 



{S Q (t,p)-S Q (t,P)} 



dNi(t) + opin 1 ' 2 ) 



E - Mt)}{Zi(t) - g(t)}) dNi(t) 



S (t,P) 



g(t){S (t,(3)-S (t, (3) }] E^(*) (*) + op ( 



1/2 N 



1=1 

E A^C*) - «*(*)} W) - g(t)}dNi(t) 



E {^j (*)-%(«)} 



x - ff (t)}exp{/?%( S )}Y J ( S +) 
X / ip n (t-s) 

Jo 



Yd=i Wi(t)dNi(t) 



S (t,P) 

!\w i (t)-w i {t)}{Z i {t)-g{t)}dM i (t) 



dN(s) + o P (n 



l/2 } 



1 

m Jo 



j'=i L 



m 



x {w j (s)-w j (s)}{Z j (s) -g(s)} 



xexp{P'Z j (s)}Y j (s+) 



n 



b(s) 



dN(s) + o P (n 1/2 ). 



The two main terms are also of order $o_P(n^{1/2})$, by observing the expressions
(A.10) and (A.11) and an argument along the lines of the proof of Theorem 4.1
of Sasieni (1993a). Hence U{j3) - U{(3) = o P (n x l 2 ) and n 1 / 2 f/(/3) -► JV(0,E) 
follows from Step 1. 

Step 3 [To show the consistency of $\hat\beta$]. Let $B(\beta, \epsilon_0)$ be the ball in $\mathbb{R}^d$
centered at $\beta$ with radius $\epsilon_0 > 0$. In view of (A.6)-(A.8), one can show that,
as $n \to \infty$ and then $\epsilon_0 \to 0$,

\[ \sup_{v \in B(\beta, \epsilon_0)} \big| -\dot U(v)/n - \Sigma \big| \to 0 \tag{A.14} \]

in probability, where $v$ may be different for different elements of the matrix
$\dot U(\cdot)$. Now, choose any small but fixed $\epsilon_0 > 0$ and view $-U(\cdot)/n$ as a random
mapping from $\mathbb{R}^d$ to $\mathbb{R}^d$. Then, since $\Sigma$ is assumed to be positive definite
and thus invertible in condition (iv), (A.14) implies that, with probability
tending to 1, the mapping $-U(\cdot)/n$ is a homeomorphism from $B(\beta, \epsilon_0)$ to its
image, denoted as $B_n$, which contains a ball of fixed radius. Since $U(\beta)/n =
O_P(n^{-1/2})$ as proved in Step 2, $B_n$ contains 0 with probability tending
to 1. Therefore, $\hat\beta$, as the unique solution of $U(\cdot) = 0$, is in the ball $B(\beta, \epsilon_0)$
with probability tending to 1. The consistency of $\hat\beta$ is proved since $\epsilon_0$ is
arbitrary.

Step 4 [To show $n^{1/2}(\hat\beta - \beta) \to N(0, \Sigma^{-1})$]. It follows from the mean
value theorem that

\[ -U(\beta) = U(\hat\beta) - U(\beta) = \dot U(v)(\hat\beta - \beta), \]

where $v$ lies on the line segment joining $\hat\beta$ and $\beta$ but may be different for
different elements of the matrix $\dot U(\cdot)$. Then the desired asymptotic normality
of $\hat\beta$ follows from (A.14), the consistency of $\hat\beta$ proved in Step 3 and the
asymptotic normality of $U(\beta)$ proved in Step 2. The proof is complete. □
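The last step of Step 4 is a standard Slutsky argument. A short sketch in the notation of Steps 2 to 4, written out as a reading aid and not as additional content of the proof:

```latex
% Solve the mean value identity for the estimation error and normalize.
\begin{align*}
n^{1/2}(\hat\beta - \beta)
  &= \{-\dot U(v)/n\}^{-1}\, n^{-1/2} U(\beta) \\
  &\to \Sigma^{-1} N(0, \Sigma)
   = N(0, \Sigma^{-1}\Sigma\,\Sigma^{-1})
   = N(0, \Sigma^{-1}).
\end{align*}
```

Here $-\dot U(v)/n \to \Sigma$ in probability because $v$ lies between $\hat\beta$ and $\beta$, $\hat\beta$ is consistent, and (A.14) holds uniformly on shrinking balls around $\beta$; the final equality uses the symmetry of $\Sigma$.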

Proof of the Proposition. Define $\eta(t) = \{1 - w(t)\}\{Z(t) - g(t)\} +
\{g(t) - \mu(t)\}/(m + 1)$. Let $\eta_i(t)$, $i = 1, \ldots, m + 1$, be i.i.d. copies of $\eta(t)$.
The following calculations use the fact that $E[\{Z(t) - \mu(t)\}\exp\{\beta' Z(t)\} \mid Y > t]
= E[w(t)\{Z(t) - g(t)\}\exp\{\beta' Z(t)\} \mid Y > t] = E[\{1 - w(t)\}\{Z(t) - g(t)\} \mid Y > t] = 0$. Write

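\begin{align*}
&\frac{1}{m+1} E\Big[\sum_{k=1}^{m+1}\eta_k(t)\sum_{j=1}^{m+1}\big[\{Z_j'(t) - \mu'(t)\}\exp\{\beta' Z_j(t)\}\big] \,\Big|\, Y_1 > t, \ldots, Y_{m+1} > t\Big] \\
&\quad= E[\eta(t)\{Z'(t) - \mu'(t)\}\exp\{\beta' Z(t)\} \mid Y > t] \\
&\quad= E[\{1 - w(t)\}\{Z(t) - g(t)\}\{Z'(t) - \mu'(t)\}\exp\{\beta' Z(t)\} \mid Y > t] \\
&\quad= E[\{Z(t) - \mu(t)\}^{\otimes 2}\exp\{\beta' Z(t)\} \mid Y > t]
   - E[w(t)\{Z(t) - g(t)\}^{\otimes 2}\exp\{\beta' Z(t)\} \mid Y > t] \\
&\quad= H_1(t) - H_2(t), \quad \text{say.}
\end{align*}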



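Similarly,

\begin{align*}
&\frac{1}{m+1} E\Big[\Big\{\sum_{k=1}^{m+1}\eta_k(t)\Big\}^{\otimes 2}\sum_{i=1}^{m+1}\exp\{\beta' Z_i(t)\} \,\Big|\, Y_1 > t, \ldots, Y_{m+1} > t\Big] \\
&\quad= E[\eta(t)^{\otimes 2}\exp\{\beta' Z(t)\} \mid Y > t] + m E\{\eta(t)^{\otimes 2} \mid Y > t\}\, b(t) \\
&\qquad + 2m E[\eta(t)\exp\{\beta' Z(t)\} \mid Y > t]\, E\{\eta'(t) \mid Y > t\}
   + m(m-1)[E\{\eta(t) \mid Y > t\}]^{\otimes 2}\, b(t) \\
&\quad= E\big(\eta(t)^{\otimes 2}[\exp\{\beta' Z(t)\} + m\,b(t)] \mid Y > t\big)
   + \frac{m}{m+1}\Big\{2\Big(-1 + \frac{1}{m+1}\Big) + \frac{m-1}{m+1}\Big\}\{g(t) - \mu(t)\}^{\otimes 2}\, b(t) \\
&\quad= E\Big[\frac{\eta(t)^{\otimes 2}\exp\{\beta' Z(t)\}}{1 - w(t)} \,\Big|\, Y > t\Big]
   - \frac{m}{m+1}\{g(t) - \mu(t)\}^{\otimes 2}\, b(t) \\
&\quad= E[\{1 - w(t)\}\{Z(t) - g(t)\}^{\otimes 2}\exp\{\beta' Z(t)\} \mid Y > t]
   - \{g(t) - \mu(t)\}^{\otimes 2}\, b(t) \\
&\quad= E[\{Z(t) - \mu(t)\}^{\otimes 2}\exp\{\beta' Z(t)\} \mid Y > t]
   - E[w(t)\{Z(t) - g(t)\}^{\otimes 2}\exp\{\beta' Z(t)\} \mid Y > t] \\
&\quad= H_1(t) - H_2(t).
\end{align*}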
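The expansion of the triple sum in the calculation above can be checked by counting index patterns. The sketch below assumes $b(t) = E[\exp\{\beta' Z(t)\} \mid Y > t]$ and $\mu(t) = E[Z(t)\exp\{\beta' Z(t)\} \mid Y > t]/b(t)$, which is what the zero-mean facts stated at the start of the proof suggest; both are labeled assumptions, not quotations. All expectations are conditional on $Y > t$ and the argument $t$ is suppressed.

```latex
% Index patterns (k, l, i) in {1,...,m+1}^3 for sum_k eta_k sum_l eta_l' sum_i e_i,
% with e_i = exp{beta' Z_i}:
%   k = l = i            : (m+1)          terms, each E[eta^{(x)2} e]
%   k = l, i different   : (m+1) m        terms, each E[eta^{(x)2}] b
%   k != l, i in {k, l}  : 2 (m+1) m      terms, each E[eta e] E[eta']
%   k, l, i all distinct : (m+1) m (m-1)  terms, each (E eta)^{(x)2} b
\begin{align*}
(m+1)\{1 + m + 2m + m(m-1)\} &= (m+1)^3, \\
E[\eta] &= \frac{g - \mu}{m+1}, \qquad
E[\eta\, e] = (\mu - g)\,b + \frac{(g - \mu)\,b}{m+1}
            = \Big(-1 + \frac{1}{m+1}\Big)(g - \mu)\,b, \\
2m\,E[\eta\, e]\,E[\eta'] + m(m-1)(E\eta)^{\otimes 2}\, b
  &= \frac{m}{m+1}\Big\{2\Big(-1 + \frac{1}{m+1}\Big) + \frac{m-1}{m+1}\Big\}(g-\mu)^{\otimes 2}\, b
   = -\frac{m}{m+1}(g-\mu)^{\otimes 2}\, b .
\end{align*}
```

The first line confirms that the four patterns exhaust all $(m+1)^3$ index triples; dividing the pattern counts by $m+1$ gives the coefficients $1$, $m$, $2m$ and $m(m-1)$.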
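Then it follows from (the matrix version of) the Cauchy-Schwarz inequality
that

\[
\frac{1}{m+1} E\left[\frac{\big[\sum_{j=1}^{m+1}\{Z_j(t) - \mu(t)\}\exp\{\beta' Z_j(t)\}\big]^{\otimes 2}}{\sum_{j=1}^{m+1}\exp\{\beta' Z_j(t)\}} \,\Bigg|\, Y_1 > t, \ldots, Y_{m+1} > t\right]
\ge H_1(t) - H_2(t),
\]

and equality holds if and only if

\[
\mathrm{pr}\left(\sum_{k=1}^{m+1}\eta_k(t) = h(t)\,\frac{\sum_{j=1}^{m+1}\{Z_j(t) - \mu(t)\}\exp\{\beta' Z_j(t)\}}{\sum_{j=1}^{m+1}\exp\{\beta' Z_j(t)\}} \,\Bigg|\, Y_1 > t, \ldots, Y_{m+1} > t\right) = 1,
\]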



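where $h(t)$ is a nonrandom function. This equality holds only when the
conditional distribution of $\sum_{j=1}^{m+1}\exp\{\beta' Z_j(t)\}$
given $Y_1 > t, \ldots, Y_{m+1} > t$ is
degenerate. If the above equality holds for all $t \in [0, \tau]$ except on a set of Lebesgue
measure 0, then $\beta = 0$. Observe that $\Sigma_c = \int_0^\tau H_1(t)\,\mathrm{pr}(Y > t)\lambda_0(t)\,dt$,
$\Sigma = \int_0^\tau H_2(t)\,\mathrm{pr}(Y > t)\lambda_0(t)\,dt$ and $\Sigma_a = \int_0^\tau H_a(t)\,\mathrm{pr}(Y > t)\lambda_0(t)\,dt$, where $H_a(t)$ denotes the left-hand side of the inequality above. Then it
follows that $\Sigma_a \ge \Sigma_c - \Sigma$, or, equivalently, $(\Sigma_c - \Sigma_a)^{-1} \ge \Sigma^{-1}$.
And $(\Sigma_c - \Sigma_a)^{-1} = \Sigma^{-1}$ if and only if $\beta = 0$. The proof is complete. □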
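The Cauchy-Schwarz step in the proof of the Proposition can be made explicit as follows. This is a sketch under the assumption that $H_1(t) - H_2(t)$ is invertible; the symbols $X$, $V$ and $A$ are introduced here for illustration only and do not appear in the paper. Expectations are conditional on $Y_1 > t, \ldots, Y_{m+1} > t$, and $e_j = \exp\{\beta' Z_j(t)\}$.

```latex
% Matrix Cauchy-Schwarz: for random vectors X, V with E[VV'] invertible and
% A = E[XV'] (E[VV'])^{-1},
\begin{align*}
E[XX'] - E[XV']\,(E[VV'])^{-1}E[VX'] &= E[(X - AV)(X - AV)'] \;\ge\; 0,
\end{align*}
% with equality if and only if X = AV almost surely. Take
\begin{align*}
X &= \frac{\sum_{j=1}^{m+1}\{Z_j(t) - \mu(t)\}e_j}{\big(\sum_{j=1}^{m+1} e_j\big)^{1/2}}, \qquad
V = \sum_{k=1}^{m+1}\eta_k(t)\,\Big(\sum_{j=1}^{m+1} e_j\Big)^{1/2}, \\
E[XV'] &= (m+1)\{H_1(t) - H_2(t)\}, \qquad
E[VV'] = (m+1)\{H_1(t) - H_2(t)\}, \\
\frac{1}{m+1}E[XX'] &\ge \{H_1 - H_2\}\{H_1 - H_2\}^{-1}\{H_1 - H_2\} = H_1(t) - H_2(t).
\end{align*}
```

The two moment identities for $E[XV']$ and $E[VV']$ are exactly the two calculations carried out earlier in the proof, each multiplied by $m+1$.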